import pandas as pd
Working with MultiIndex DataFrames
This notebook covers MultiIndex DataFrames in pandas: creation, inspection, and combining with NumPy arrays. MultiIndex allows hierarchical indexing for complex data structures.
Introduction to MultiIndex
MultiIndex in pandas allows you to have multiple levels of indexing on rows or columns. It’s useful for hierarchical data like time series with multiple categories.
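The text above mentions that a MultiIndex can sit on columns as well as rows. A minimal sketch of the column case follows; the sales DataFrame, its labels, and its numbers are hypothetical values made up for illustration.

```python
import pandas as pd

# Hypothetical two-level column index: year on the outer level, quarter on the inner
columns = pd.MultiIndex.from_product(
    [["2023", "2024"], ["Q1", "Q2"]], names=("Year", "Quarter")
)

# Hypothetical data: one row per region, one column per (year, quarter) pair
sales = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]], index=["North", "South"], columns=columns
)

# Selecting an outer-level label returns all of its inner-level columns
sales_2024 = sales["2024"]
```

Selecting by the outer label drops that level, so sales_2024 is an ordinary DataFrame with the quarter labels as its columns.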
Creating a MultiIndex DataFrame
Use pd.MultiIndex.from_arrays() to create a MultiIndex from arrays. Here, we create a DataFrame with a two-level row index.
# Outer-level labels first, inner-level labels second; one entry per row
arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
index = pd.MultiIndex.from_arrays(arrays, names=('First', 'Second'))
df = pd.DataFrame({'Data': [10, 20, 30, 40]}, index=index)
df

| First | Second | Data |
|---|---|---|
| A | 1 | 10 |
| | 2 | 20 |
| B | 1 | 30 |
| | 2 | 40 |
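from_arrays() is one of several constructors. As a sketch of two alternatives that pandas also provides, the same index can be built from a list of tuples or from the product of the level values. The variable names idx_tuples, idx_product, and df_alt are chosen here for illustration.

```python
import pandas as pd

# from_tuples: one tuple per row, giving the label at each level
idx_tuples = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 2), ("B", 1), ("B", 2)], names=("First", "Second")
)

# from_product: every combination of the given level values, in order
idx_product = pd.MultiIndex.from_product(
    [["A", "B"], [1, 2]], names=("First", "Second")
)

# Either index can be passed to the DataFrame constructor
df_alt = pd.DataFrame({"Data": [10, 20, 30, 40]}, index=idx_product)

# The two constructors produce the same labels in the same order
same = idx_tuples.equals(idx_product)
```

from_product() is convenient when every outer label pairs with every inner label; from_tuples() fits better when only some combinations occur.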
Inspecting the MultiIndex
Access the index with df.index. The output lists one tuple per row, with one label for each level, followed by the level names.
df.index

MultiIndex([('A', 1),
            ('A', 2),
            ('B', 1),
            ('B', 2)],
           names=['First', 'Second'])
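Beyond printing df.index, a few attributes expose individual pieces of the hierarchy. The sketch below rebuilds the same df so that it runs on its own; the variable names on the left-hand side are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["A", "A", "B", "B"], [1, 2, 1, 2]], names=("First", "Second")
)
df = pd.DataFrame({"Data": [10, 20, 30, 40]}, index=index)

# Number of levels in the index
level_count = df.index.nlevels

# Names assigned to each level
level_names = list(df.index.names)

# Label of one level for every row, selected by level name
first_labels = df.index.get_level_values("First")

# Unique values stored for the second level (position 1)
unique_second = df.index.levels[1]
```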
Combining pandas with NumPy
pandas is built on top of NumPy, so the two work together directly: you can create DataFrames from NumPy arrays and apply NumPy functions to DataFrame data.
Creating DataFrames from NumPy Arrays
Use pd.DataFrame() with a NumPy array to create a DataFrame. Specify column names for clarity.
import numpy as np
array_data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df_np = pd.DataFrame(array_data, columns=['Sales A', 'Sales B', 'Sales C'])
df_np

| | Sales A | Sales B | Sales C |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |
| 2 | 7 | 8 | 9 |
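This section also mentions using NumPy functions on DataFrame data. As a small sketch of that direction, the example below applies a NumPy ufunc to df_np and converts the frame back to an array; sqrt_df, values, and column_totals are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

array_data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df_np = pd.DataFrame(array_data, columns=["Sales A", "Sales B", "Sales C"])

# NumPy ufuncs apply element-wise and return a DataFrame with the same labels
sqrt_df = np.sqrt(df_np)

# to_numpy() returns the underlying values as a plain ndarray
values = df_np.to_numpy()

# Regular NumPy reductions then work on the array (axis=0 sums each column)
column_totals = values.sum(axis=0)
```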
Best Practices
- Use meaningful names for MultiIndex levels (e.g., ‘Category’, ‘Subcategory’).
- When selecting data, use .loc[] with tuples for MultiIndex access.
- Reset the index with df.reset_index() if you need to flatten the hierarchy.
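The selection and flattening tips above can be sketched on the df from earlier, rebuilt here so the block runs on its own. The variable names value_b1, rows_a, and flat are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["A", "A", "B", "B"], [1, 2, 1, 2]], names=("First", "Second")
)
df = pd.DataFrame({"Data": [10, 20, 30, 40]}, index=index)

# A full tuple identifies one row; adding a column label returns a single value
value_b1 = df.loc[("B", 1), "Data"]

# A partial key (outer level only) returns every row under that label
rows_a = df.loc["A"]

# reset_index() moves the index levels into ordinary columns
flat = df.reset_index()
```

After reset_index(), the frame has a default integer index and the former levels First and Second appear as regular columns ahead of Data.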
Summary
This notebook demonstrated creating and inspecting MultiIndex DataFrames and integrating pandas with NumPy arrays. MultiIndex is powerful for complex data but can be tricky, so practice with real datasets.